English Conversational Telephone Speech Recognition by Humans and Machines

Authors

  • George Saon
  • Gakuto Kurata
  • Tom Sercu
  • Kartik Audhkhasi
  • Samuel Thomas
  • Dimitrios Dimitriadis
  • Xiaodong Cui
  • Bhuvana Ramabhadran
  • Michael Picheny
  • Lynn-Li Lim
  • Bergul Roomi
  • Phil Hall
Abstract

Word error rates on the Switchboard conversational corpus that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This raises two questions: what is human performance, and how far down can we still drive speech recognition error rates? In trying to assess human performance, we performed an independent set of measurements on the Switchboard and CallHome subsets of the Hub5 2000 evaluation and found that human accuracy may be considerably better than what was earlier reported, giving the community a significantly harder goal to achieve. We also report on our own efforts in this area, presenting a set of acoustic and language modeling techniques that lowered the WER of our system to 5.5%/10.3% on these subsets, which is a new performance milestone (albeit not at what we measure to be human performance). On the acoustic side, we use a score fusion of one LSTM with multiple feature inputs, a second LSTM trained with speaker-adversarial multi-task learning and a third convolutional residual net (ResNet). On the language modeling side, we use word and character LSTMs and convolutional WaveNet-style language models.
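
The figures quoted in the abstract are word error rates (WER). As a minimal sketch of how that metric is conventionally defined (substitutions plus deletions plus insertions, divided by the number of reference words), the function below computes it with a word-level edit distance. The function name and the whitespace tokenization are assumptions made for illustration; this is not the scoring pipeline used by the authors, and official evaluations apply text normalization rules that are omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokens are split on whitespace and compared exactly,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors while the denominator is the reference length, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.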


Similar resources

Improving English Conversational Telephone Speech Recognition

The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation and score fusion of DNN and BLSTM models. Training set consisted of the 300 ho...


The IBM 2015 English conversational telephone speech recognition system

We describe the latest improvements to the IBM English conversational telephone speech recognition system. Some of the techniques that were found beneficial are: maxout networks with annealed dropout rates; networks with a very large number of outputs trained on 2000 hours of data; joint modeling of partially unfolded recurrent neural networks and convolutional nets by combining the bottleneck ...


Acoustic variability in spontaneous conversational speech of American English talkers

Speaker variability strongly impacts human perception and technology performance, yet large-scale, systematic study of the acoustic characteristics involved is rarely undertaken. This study provides statistics on selected segmental and suprasegmental acoustic parameters from measures made on spontaneous conversational telephone speech from 160 speakers in the Switchboard Corpus. Since spontaneo...


2000 NIST Evaluation of Conversational Speech Recognition over the Telephone: English and Mandarin Performance Results

This paper documents the use of conversational telephone speech test materials in the NIST coordinated evaluation conducted early in 2000. The primary evaluation was of General American English speech, but a subsidiary evaluation of Mandarin speech was also offered. The primary test data consisted of twenty conversations collected for the original Switchboard Corpus but not released with the pu...


Improving Language Models for Mandarin Conversational Speech Recognition with Web Data

Lack of data is a problem in training language models for conversational speech recognition, particularly for languages other than English. Experiments in English have successfully used web-based text collection targeted for a conversational style to augment small sets of transcribed speech; here we look at extending these techniques to Mandarin. In addition, we investigate different techniques ...




Publication date: 2017